In this paper, we explore, from the perspective of computational language processing, how morphological rules for Chinese can be represented and how they can be derived by a corpus-based learning method.
The derived morphological rules are intended to help computers identify unknown words, i.e., words not listed in the lexicon, such as numerals, proper nouns, abbreviations, compounds, and loanwords.
Chen Keh-Jiann and Chen Chao-Jan discussed a computational approach to identifying unknown words. They tried to discover morphological rules for Chinese with a corpus-based learning method, focusing on word-formation rules that cannot be represented by regular expressions.
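The contrast between regular and non-regular rules can be illustrated with a minimal Python sketch. Numerals followed by a classifier form a regular pattern that one regular expression recognizes. The character sets below are a small, hypothetical sample rather than a complete inventory, and the pattern over-generates (it also accepts ill-formed numerals such as 十十個).

```python
import re

# Hypothetical, non-exhaustive character sets chosen for illustration only.
NUMERAL = "零一二三四五六七八九十百千萬億兩"
CLASSIFIER = "個本隻張條位"

# One or more numeral characters followed by exactly one classifier.
DM_PATTERN = re.compile(f"[{NUMERAL}]+[{CLASSIFIER}]")


def is_numeral_classifier(token: str) -> bool:
    """Return True if the whole token matches the numeral + classifier pattern."""
    return DM_PATTERN.fullmatch(token) is not None


print(is_numeral_classifier("三百二十五個"))  # → True
print(is_numeral_classifier("科學家"))        # → False
```

Whether a compound such as 科學家 is acceptable, by contrast, depends on the meaning of 科學 rather than on which characters it contains, which is the kind of constraint a character-level regular expression cannot express.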
Within the framework of generative grammar, Government and Binding theory provides semantic and syntactic evidence for classifying these sentence types: the argument structure of the predicate and the EPP (Extended Projection Principle) determine the obligatory elements at the sentence level, while the operation of morphological rules and the requirements of Case theory give rise to certain derived sentence types.
Taking compounds as an example, a morphological rule is derived automatically from the corpus for each high-frequency affix head, and the semantic classes of the modifiers observed with that head serve as the rule's semantic restriction. The plausibility of an unknown compound is then estimated by the similarity between the semantic class of its modifier morpheme and the semantic restriction posted in the corresponding rule.
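The training-and-scoring procedure can be sketched in Python as follows. This is a minimal illustration resting on assumptions the source does not state: semantic classes are dot-separated hierarchical codes read from most general to most specific (e.g. "D.a.01"), similarity is the fraction of hierarchy levels two codes share, "high frequency" means a head occurs in at least `min_freq` training compounds, and an unknown compound scores its best match against the head's restriction. The class codes, training pairs, and function names are all hypothetical.

```python
from collections import Counter, defaultdict


def class_similarity(a: str, b: str) -> float:
    """Fraction of hierarchy levels shared by two codes (assumed format "X.y.NN")."""
    pa, pb = a.split("."), b.split(".")
    shared = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        shared += 1
    return shared / max(len(pa), len(pb))


def train_rules(compounds, min_freq=2):
    """Build one rule per high-frequency head.

    `compounds` holds (modifier_class, head) pairs; a head's rule is the
    multiset of modifier classes seen with it (its semantic restriction).
    """
    compounds = list(compounds)  # allow generators, which we iterate twice
    head_freq = Counter(head for _, head in compounds)
    rules = defaultdict(Counter)
    for mod_class, head in compounds:
        if head_freq[head] >= min_freq:
            rules[head][mod_class] += 1
    return dict(rules)


def score_compound(rules, mod_class, head):
    """Best similarity between the modifier's class and the head's restriction."""
    restriction = rules.get(head)
    if not restriction:
        return 0.0  # no rule was learned for this head
    return max(class_similarity(mod_class, c) for c in restriction)


# Hypothetical training pairs: (modifier semantic class, head).
training = [("D.a.01", "家"), ("D.a.02", "家"), ("D.a.01", "家"), ("B.o.05", "機")]
rules = train_rules(training, min_freq=2)
print(score_compound(rules, "D.a.07", "家"))  # → 0.6666666666666666
print(score_compound(rules, "B.o.05", "家"))  # → 0.0
```

Taking the maximum rewards a match with any attested modifier class; a frequency-weighted average over the restriction would be an equally plausible choice if common modifier classes should count for more.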